setwd("/Users/yameili/Desktop/ecob2000_lecture1/acs2017_ny")
load("acs2017_ny_data.Rdata")
attach(acs2017_ny)
use_varb <- (AGE >= 25) & (AGE <= 55) & (LABFORCE == 2) & (WKSWORK2 > 4) & (UHRSWORK >= 35)
dat_use <- subset(acs2017_ny,use_varb) # 
detach()
attach(dat_use)

This selects people 25-55 (often called prime age), in labor force, working year round, fulltime, We’ll figure out what factors affect income wages, first, we guess these factors can influence the income wages, and then we build a model to test.

model_temp1 <- lm(INCWAGE ~ AGE + female + AfAm + Asian + Amindian + race_oth + Hispanic + educ_hs + educ_somecoll + educ_college + educ_advdeg)
summary(model_temp1)
## 
## Call:
## lm(formula = INCWAGE ~ AGE + female + AfAm + Asian + Amindian + 
##     race_oth + Hispanic + educ_hs + educ_somecoll + educ_college + 
##     educ_advdeg)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -148088  -33205  -10708   13053  625543 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -7096.25    2446.71  -2.900 0.003730 ** 
## AGE             1316.69      39.66  33.199  < 2e-16 ***
## female        -24939.46     720.43 -34.617  < 2e-16 ***
## AfAm          -11934.26    1130.37 -10.558  < 2e-16 ***
## Asian            566.53    1369.83   0.414 0.679188    
## Amindian       -8858.57    6077.71  -1.458 0.144971    
## race_oth       -7526.49    1272.49  -5.915 3.35e-09 ***
## Hispanic       -4224.82    1183.47  -3.570 0.000358 ***
## educ_hs        10592.37    1814.71   5.837 5.35e-09 ***
## educ_somecoll  22461.39    1857.67  12.091  < 2e-16 ***
## educ_college   57155.71    1830.96  31.216  < 2e-16 ***
## educ_advdeg    82766.43    1878.64  44.057  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 76760 on 46959 degrees of freedom
## Multiple R-squared:   0.15,  Adjusted R-squared:  0.1498 
## F-statistic: 753.6 on 11 and 46959 DF,  p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(model_temp1,col="blue",pch=16,cex=1,lwd=1,lty=2)

require(stargazer)
## Loading required package: stargazer
## 
## Please cite as:
##  Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
stargazer(model_temp1, type = "text")
## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                               INCWAGE          
## -----------------------------------------------
## AGE                        1,316.691***        
##                              (39.661)          
##                                                
## female                    -24,939.460***       
##                              (720.433)         
##                                                
## AfAm                      -11,934.250***       
##                             (1,130.372)        
##                                                
## Asian                         566.528          
##                             (1,369.834)        
##                                                
## Amindian                    -8,858.569         
##                             (6,077.710)        
##                                                
## race_oth                   -7,526.487***       
##                             (1,272.485)        
##                                                
## Hispanic                   -4,224.816***       
##                             (1,183.469)        
##                                                
## educ_hs                    10,592.370***       
##                             (1,814.709)        
##                                                
## educ_somecoll              22,461.390***       
##                             (1,857.674)        
##                                                
## educ_college               57,155.710***       
##                             (1,830.963)        
##                                                
## educ_advdeg                82,766.430***       
##                             (1,878.638)        
##                                                
## Constant                   -7,096.252***       
##                             (2,446.712)        
##                                                
## -----------------------------------------------
## Observations                  46,971           
## R2                             0.150           
## Adjusted R2                    0.150           
## Residual Std. Error   76,755.980 (df = 46959)  
## F Statistic         753.551*** (df = 11; 46959)
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

from the graph, we can see that the p-value of Asian is 0.679188 and the p-value of Amindian is 0.144971,We can infer that the regression coefficients of these two variables are not significant, so we need to consider whether these two variables are suitable for our model. Besides,the Multiple R-squared is 0.15,This model can only explain the variance of 15%, so we need to optimize it.

Model optimization we need to draw the scatter graph between incomewage and Amindian,incomewageand Asian.

nasian<-as.numeric(as.character(dat_use$INCWAGE))
par(mfrow=c(2,2))
Wage_asian<-lm(INCWAGE~Asian)
plot(Wage_asian,col="purple",pch=14,cex=1,lwd=1,lty=2)

summary(Wage_asian)
## 
## Call:
## lm(formula = INCWAGE ~ Asian)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -76998 -39981 -19981  12019 566019 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  71980.6      404.6 177.913  < 2e-16 ***
## Asian         5017.1     1286.1   3.901  9.6e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 83230 on 46969 degrees of freedom
## Multiple R-squared:  0.0003239,  Adjusted R-squared:  0.0003026 
## F-statistic: 15.22 on 1 and 46969 DF,  p-value: 9.598e-05
nasian<-as.numeric(as.character(dat_use$INCWAGE))
par(mfrow=c(2,2))
Wage_amindian<-lm(INCWAGE~Amindian)
plot(Wage_amindian,col="purple",pch=14,cex=1,lwd=1,lty=2)

summary(Wage_amindian)
## 
## Call:
## lm(formula = INCWAGE ~ Amindian)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -72553 -40553 -20553  12447 587481 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  72552.5      384.7 188.586   <2e-16 ***
## Amindian    -22033.3     6571.2  -3.353    8e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 83240 on 46969 degrees of freedom
## Multiple R-squared:  0.0002393,  Adjusted R-squared:  0.000218 
## F-statistic: 11.24 on 1 and 46969 DF,  p-value: 8e-04

According to the graph, there is no direct relationship between these two variables and incomewage, we should remove these two variables from the model1, and then carry out a stepwise regression,

step(model_temp1)
## Start:  AIC=1056708
## INCWAGE ~ AGE + female + AfAm + Asian + Amindian + race_oth + 
##     Hispanic + educ_hs + educ_somecoll + educ_college + educ_advdeg
## 
##                 Df  Sum of Sq        RSS     AIC
## - Asian          1 1.0077e+09 2.7666e+14 1056706
## <none>                        2.7666e+14 1056708
## - Amindian       1 1.2516e+10 2.7667e+14 1056708
## - Hispanic       1 7.5080e+10 2.7673e+14 1056719
## - educ_hs        1 2.0072e+11 2.7686e+14 1056740
## - race_oth       1 2.0611e+11 2.7686e+14 1056741
## - AfAm           1 6.5671e+11 2.7731e+14 1056817
## - educ_somecoll  1 8.6131e+11 2.7752e+14 1056852
## - educ_college   1 5.7410e+12 2.8240e+14 1057671
## - AGE            1 6.4932e+12 2.8315e+14 1057796
## - female         1 7.0601e+12 2.8372e+14 1057890
## - educ_advdeg    1 1.1435e+13 2.8809e+14 1058608
## 
## Step:  AIC=1056706
## INCWAGE ~ AGE + female + AfAm + Amindian + race_oth + Hispanic + 
##     educ_hs + educ_somecoll + educ_college + educ_advdeg
## 
##                 Df  Sum of Sq        RSS     AIC
## <none>                        2.7666e+14 1056706
## - Amindian       1 1.2451e+10 2.7667e+14 1056706
## - Hispanic       1 8.9401e+10 2.7675e+14 1056719
## - educ_hs        1 1.9974e+11 2.7686e+14 1056738
## - race_oth       1 2.4610e+11 2.7691e+14 1056746
## - AfAm           1 6.6303e+11 2.7732e+14 1056817
## - educ_somecoll  1 8.6128e+11 2.7752e+14 1056850
## - educ_college   1 5.7430e+12 2.8240e+14 1057669
## - AGE            1 6.4922e+12 2.8315e+14 1057794
## - female         1 7.0610e+12 2.8372e+14 1057888
## - educ_advdeg    1 1.1441e+13 2.8810e+14 1058607
## 
## Call:
## lm(formula = INCWAGE ~ AGE + female + AfAm + Amindian + race_oth + 
##     Hispanic + educ_hs + educ_somecoll + educ_college + educ_advdeg)
## 
## Coefficients:
##   (Intercept)            AGE         female           AfAm       Amindian  
##         -7004           1316         -24941         -11965          -8835  
##      race_oth       Hispanic        educ_hs  educ_somecoll   educ_college  
##         -7282          -4378          10547          22409          57128  
##   educ_advdeg  
##         82740

When we did the stepwise regression, the original AIC value was 1056708. In the first regression, we removed the Variable Asian and the AIC value changed to 1056706, which is consistent with our judgment based on the figure above.so we need to delete Asian variable and Amindian varible. Next, we will analyze the impact of each variable on wages. AGE~INCOME WAGE

NNobs <- length(INCWAGE)
set.seed(00000)
graph_obs <- (runif(NNobs) < 0.1) 
dat_graph <-subset(dat_use,graph_obs)
plot(INCWAGE ~ jitter(AGE, factor = 2), pch = 19, col = rgb(0.3, 0.4, 0.5, alpha = 0.8), ylim = c(0,150000), data = dat_graph)
to_be_predicted2 <- data.frame(AGE = 25:55, female = 1, AfAm = 0, Asian = 0, Amindian = 1, race_oth = 1, Hispanic = 1, educ_hs = 0, educ_somecoll = 0, educ_college = 1, educ_advdeg = 0)
to_be_predicted2$yhat <- predict(model_temp1, newdata = to_be_predicted2)
lines(yhat ~ AGE, data = to_be_predicted2, col="red",cex=2,lwd=2,lty=2)

nage<-as.numeric(as.character(dat_use$INCWAGE))
par(mfrow=c(2,2))
Wage_age<-lm(INCWAGE~AGE)
plot(Wage_age,col="pink",pch=20,cex=1,lwd=1,lty=1)

summary(Wage_age)
## 
## Call:
## lm(formula = INCWAGE ~ AGE)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -86853 -38985 -18021  12041 580882 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  32338.6     1747.3   18.51   <2e-16 ***
## AGE            991.2       42.1   23.54   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 82760 on 46969 degrees of freedom
## Multiple R-squared:  0.01166,    Adjusted R-squared:  0.01164 
## F-statistic: 554.2 on 1 and 46969 DF,  p-value: < 2.2e-16

As can be seen from the figure, age has positively correlated with income wage. From the age of 25 to 55, the income increases with the increase of age. In addition, the income of different age groups is concentrated between 3000 and 10000, and the number of people with income reaching 150,000 increases from the age of 40.

To improve the accuracy, we performed polynomial regression on the independent variable of age.

nage<-as.numeric(as.character(dat_use$INCWAGE))
par(mfrow=c(2,2))
Wage_age2<-lm(INCWAGE~AGE+I(AGE^2),data=dat_use)
plot(Wage_age2,col="pink",pch=20,cex=1,lwd=1,lty=1)

summary(Wage_age2)
## 
## Call:
## lm(formula = INCWAGE ~ AGE + I(AGE^2), data = dat_use)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -82216 -39552 -17216  12401 593941 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.004e+05  8.440e+03  -11.90   <2e-16 ***
## AGE          7.932e+03  4.338e+02   18.28   <2e-16 ***
## I(AGE^2)    -8.612e+01  5.358e+00  -16.07   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 82530 on 46968 degrees of freedom
## Multiple R-squared:  0.01707,    Adjusted R-squared:  0.01703 
## F-statistic: 407.8 on 2 and 46968 DF,  p-value: < 2.2e-16

After polynomial regression, we can see that the accuracy has been improved, indicating that age has a great influence on salary.

female~INCOME WAGE

str(female)
##  num [1:46971] 0 1 0 1 1 0 0 0 1 0 ...
model_temp2 <- lm(INCWAGE ~ female, data = dat_use)
par(mfrow=c(2,2))
plot(model_temp2, col="gray")

summary(model_temp2)
## 
## Call:
## lm(formula = INCWAGE ~ female, data = dat_use)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -80964 -39212 -18212  12788 575788 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  80963.9      515.9  156.95   <2e-16 ***
## female      -18752.3      766.8  -24.45   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 82720 on 46969 degrees of freedom
## Multiple R-squared:  0.01257,    Adjusted R-squared:  0.01255 
## F-statistic:   598 on 1 and 46969 DF,  p-value: < 2.2e-16

F test:p-value: < 2.2e-16, There is a relationship between gender and income.

SEX_f <- factor(dat_use$SEX)
summary(SEX_f)
##   Male Female 
##  25713  21258
boxplot(dat_use$INCWAGE~dat_use$SEX,data=dat_use,col="red",main="wage~sex",xlab="income",ylab="sex",horizontal = TRUE)

we can see that the mean of male’s wage greater than the mean of female’s wage,and for the number of max wage,male are more than the female.In high-income groups, men exceed women.

model_temp3 <- lm(INCWAGE ~educ_advdeg, data = dat_use)
par(mfrow=c(2,2))
plot(model_temp3,col="green")

summary(model_temp3)
## 
## Call:
## lm(formula = INCWAGE ~ educ_advdeg, data = dat_use)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -113149  -36801  -16801   12199  576199 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  61801.0      417.8  147.91   <2e-16 ***
## educ_advdeg  51347.7      916.4   56.03   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 80600 on 46969 degrees of freedom
## Multiple R-squared:  0.06266,    Adjusted R-squared:  0.06264 
## F-statistic:  3140 on 1 and 46969 DF,  p-value: < 2.2e-16
model_temp4 <- lm(INCWAGE ~educ_advdeg, data = dat_use)
par(mfrow=c(2,2))
plot(model_temp4,col="green")

summary(model_temp4)
## 
## Call:
## lm(formula = INCWAGE ~ educ_advdeg, data = dat_use)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -113149  -36801  -16801   12199  576199 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  61801.0      417.8  147.91   <2e-16 ***
## educ_advdeg  51347.7      916.4   56.03   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 80600 on 46969 degrees of freedom
## Multiple R-squared:  0.06266,    Adjusted R-squared:  0.06264 
## F-statistic:  3140 on 1 and 46969 DF,  p-value: < 2.2e-16
model_temp5 <- lm(INCWAGE ~educ_advdeg+I(educ_advdeg^2), data = dat_use)
par(mfrow=c(2,2))
plot(model_temp5,col="green")

summary(model_temp5)
## 
## Call:
## lm(formula = INCWAGE ~ educ_advdeg + I(educ_advdeg^2), data = dat_use)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -113149  -36801  -16801   12199  576199 
## 
## Coefficients: (1 not defined because of singularities)
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       61801.0      417.8  147.91   <2e-16 ***
## educ_advdeg       51347.7      916.4   56.03   <2e-16 ***
## I(educ_advdeg^2)       NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 80600 on 46969 degrees of freedom
## Multiple R-squared:  0.06266,    Adjusted R-squared:  0.06264 
## F-statistic:  3140 on 1 and 46969 DF,  p-value: < 2.2e-16
new.model<-lm(INCWAGE ~AGE+ female + educ_somecoll + educ_college + educ_advdeg,data = dat_use)
par(mfrow=c(2,2))
plot(new.model,col="gold",pch=20,cex=1,lwd=1,lty=1)

require(stargazer)
stargazer(new.model, type = "text")
## 
## ================================================
##                         Dependent variable:     
##                     ----------------------------
##                               INCWAGE           
## ------------------------------------------------
## AGE                         1,343.870***        
##                               (39.636)          
##                                                 
## female                     -25,581.830***       
##                              (719.294)          
##                                                 
## educ_somecoll              14,406.760***        
##                             (1,010.572)         
##                                                 
## educ_college               49,929.340***        
##                              (946.274)          
##                                                 
## educ_advdeg                75,991.160***        
##                             (1,020.901)         
##                                                 
## Constant                    -3,048.084*         
##                             (1,809.962)         
##                                                 
## ------------------------------------------------
## Observations                   46,971           
## R2                             0.146            
## Adjusted R2                    0.146            
## Residual Std. Error   76,936.290 (df = 46965)   
## F Statistic         1,604.883*** (df = 5; 46965)
## ================================================
## Note:                *p<0.1; **p<0.05; ***p<0.01
lm1<-lm(INCWAGE ~AGE+ female + educ_somecoll + educ_college + educ_advdeg,data = dat_use)
summary(lm1)
## 
## Call:
## lm(formula = INCWAGE ~ AGE + female + educ_somecoll + educ_college + 
##     educ_advdeg, data = dat_use)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -146856  -33301  -10864   13016  618251 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -3048.08    1809.96  -1.684   0.0922 .  
## AGE             1343.87      39.64  33.905   <2e-16 ***
## female        -25581.84     719.29 -35.565   <2e-16 ***
## educ_somecoll  14406.76    1010.57  14.256   <2e-16 ***
## educ_college   49929.34     946.27  52.764   <2e-16 ***
## educ_advdeg    75991.16    1020.90  74.435   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 76940 on 46965 degrees of freedom
## Multiple R-squared:  0.1459, Adjusted R-squared:  0.1458 
## F-statistic:  1605 on 5 and 46965 DF,  p-value: < 2.2e-16

y=-3048.08+1343.87X1-25581.84X2+14406.76X3+49929.34X4+75991.16X5

par(mfrow=c(2,2))
plot(lm1,col="orange",pch=5,cex=1)

lm2<-update(lm1,.~. -female)
summary(lm2)
## 
## Call:
## lm(formula = INCWAGE ~ AGE + educ_somecoll + educ_college + educ_advdeg, 
##     data = dat_use)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -132859  -33770  -12018   11591  614027 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -12161.04    1815.68  -6.698 2.14e-11 ***
## AGE             1338.31      40.17  33.320  < 2e-16 ***
## educ_somecoll  11922.21    1021.63  11.670  < 2e-16 ***
## educ_college   47168.94     955.69  49.356  < 2e-16 ***
## educ_advdeg    71413.33    1026.29  69.584  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 77960 on 46966 degrees of freedom
## Multiple R-squared:  0.1229, Adjusted R-squared:  0.1228 
## F-statistic:  1646 on 4 and 46966 DF,  p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(lm2,col="orange")

lm3<-update(lm1,.~. -female-AGE)
summary(lm3)
## 
## Call:
## lm(formula = INCWAGE ~ educ_somecoll + educ_college + educ_advdeg, 
##     data = dat_use)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -113149  -34149  -14303   11697  593697 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    44303.4      659.5  67.176   <2e-16 ***
## educ_somecoll  10210.3     1032.3   9.891   <2e-16 ***
## educ_college   42215.1      955.1  44.198   <2e-16 ***
## educ_advdeg    68845.3     1035.4  66.491   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 78880 on 46967 degrees of freedom
## Multiple R-squared:  0.1022, Adjusted R-squared:  0.1021 
## F-statistic:  1782 on 3 and 46967 DF,  p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(lm3,col="orange")

lm4<-update(lm1,.~. +female*AGE)
summary(lm4)
## 
## Call:
## lm(formula = INCWAGE ~ AGE + female + educ_somecoll + educ_college + 
##     educ_advdeg + AGE:female, data = dat_use)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -150330  -33605  -10534   12810  616856 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -13228.61    2325.45  -5.689 1.29e-08 ***
## AGE             1596.66      53.72  29.722  < 2e-16 ***
## female         -3407.80    3262.77  -1.044    0.296    
## educ_somecoll  14473.40    1010.11  14.329  < 2e-16 ***
## educ_college   49755.79     946.12  52.589  < 2e-16 ***
## educ_advdeg    75742.26    1021.01  74.184  < 2e-16 ***
## AGE:female      -547.20      78.54  -6.967 3.27e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 76900 on 46964 degrees of freedom
## Multiple R-squared:  0.1468, Adjusted R-squared:  0.1467 
## F-statistic:  1347 on 6 and 46964 DF,  p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(lm4,col="orange")

new<-data.frame(female=0,AGE = 35,educ_somecoll = 0, educ_college = 1, educ_advdeg = 0)
lm.pred<-predict(lm1,new,interval="prediction",level=0.95)
lm.pred
##        fit    lwr      upr
## 1 93916.72 -56887 244720.5

we can predict a man who has a college educational level, at age 35, his income interval=(-56887,244720.5) at the 95% confidence level.

In conclusion. Age, gender and education background are both factors affecting income, among which the average income of males is greater than that of females. Moreover, in high-income groups, the number of males is greater than that of females. For education background, the higher education background is, the higher income is, and the income of people with a graduate degree is much higher than that of people with a high school education.

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

Including Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.